Java implementation of the LOAD graph model
for cross-document entity and event exploration
(c) 2016, Andreas Spitz, spitz@informatik.uni-heidelberg.de

Originally published at:
http://dbs.ifi.uni-heidelberg.de/index.php?id=load

A. Licensing
No license has been assigned so far. You may use this code
freely for research purposes and/or personal use. If in doubt,
please contact us.

B. Documentation
This code is a fully working yet preliminary implementation
of the LOAD model for event and entity extraction, browsing
and summarization. Since it is still in development, the
documentation is sparse in some places. Therefore, please
do not hesitate to contact us if you run into trouble.

C. Dependencies
The code requires three libraries:

1. The Snowball porter stemmer
http://snowball.tartarus.org/

2. The MongoDB Java driver v3.0 or higher
https://docs.mongodb.com/ecosystem/drivers/java/

3. The Trove collection library v3.0 or higher
http://trove.starlight-systems.com/

The libraries are bundled with this version. However, we
recommend that you download the latest versions from the
provided URLs.

D. Contents
The implementation can be split into roughly three parts:
(1) The algorithm for creating a LOAD graph
(2) A console-based query interface
(3) Tools for exporting the graph

LOAD graphs that can be used with the interface are available
as separate downloads at
http://dbs.ifi.uni-heidelberg.de/index.php?id=load

(1) Algorithm for creating a LOAD graph
This can be found in ParallelExtractNetworkFromMongo.java.
Note that this requires the input data to be stored in
MongoDB as a collection of sentences and a collection of
annotations.

The collection of sentences should include
* the sentence text
* a consecutive numbering of the sentences within each page
* the page ID (document ID) for each sentence
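To make the expected sentence layout concrete, the following is a minimal Java sketch that mirrors one sentence document as a record and checks the consecutive-numbering requirement page by page. The names SentenceDoc, SentenceChecks and pagesWithGaps are hypothetical, and the sketch does not use the actual MongoDB field names, which are defined by the code itself. It only assumes that numbers within a page have no gaps or duplicates, not that they start at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mirror of one document in the sentence collection.
// The field names are illustrative, not the ones used by the LOAD code.
record SentenceDoc(int pageId, int sentenceId, String text) {}

final class SentenceChecks {
    private SentenceChecks() {}

    // Returns the IDs of all pages whose sentence numbers are not consecutive,
    // i.e. contain a gap or a duplicate. The first number may be arbitrary.
    static List<Integer> pagesWithGaps(List<SentenceDoc> sentences) {
        // Group sentence numbers by page; TreeMap keeps page IDs sorted.
        Map<Integer, List<Integer>> byPage = new TreeMap<>();
        for (SentenceDoc s : sentences) {
            byPage.computeIfAbsent(s.pageId(), k -> new ArrayList<>()).add(s.sentenceId());
        }
        List<Integer> invalid = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : byPage.entrySet()) {
            List<Integer> ids = entry.getValue();
            ids.sort(null); // natural (ascending) order
            int first = ids.get(0);
            for (int i = 1; i < ids.size(); i++) {
                if (ids.get(i) != first + i) {
                    invalid.add(entry.getKey());
                    break;
                }
            }
        }
        return invalid;
    }
}
```

Such a check could be run over an export of the sentence collection before starting the extraction.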

The collection of annotations should contain annotations
of type person, location, organization and date. For each
annotation, the collection should include
* the sentence in which it occurs
* the page to which it belongs
* begin offset
* end offset
* cover text
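The sketch below models an annotation with the fields listed above and shows, in a deliberately simplified form, how the page and sentence fields can be turned into weighted edges between entities. This is not the extraction procedure of ParallelExtractNetworkFromMongo.java: entities are identified by type plus cover text (no disambiguation), and both the window of 5 sentences and the exponential decay by sentence distance are assumptions chosen for illustration. All class and method names are hypothetical; please refer to the paper for the actual edge weighting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CooccurrenceSketch {
    private CooccurrenceSketch() {}

    // Annotation types named in the input description.
    enum Type { PERSON, LOCATION, ORGANIZATION, DATE }

    // Hypothetical mirror of one annotation document. Offsets are kept for
    // completeness but are not needed for the sentence-distance weighting below.
    record Annotation(int pageId, int sentenceId, int begin, int end, String coverText, Type type) {}

    // Assumed window: annotations more than WINDOW sentences apart are not linked.
    static final int WINDOW = 5;

    // Simplified entity identity: type plus cover text (no disambiguation).
    static String entityKey(Annotation a) {
        return a.type() + ":" + a.coverText();
    }

    // Computes undirected edge weights as a sum of exp(-d) over all co-occurrences
    // within the window, where d is the distance in sentences on the same page.
    static Map<String, Double> edgeWeights(List<Annotation> annotations) {
        Map<Integer, List<Annotation>> byPage = new HashMap<>();
        for (Annotation a : annotations) {
            byPage.computeIfAbsent(a.pageId(), k -> new ArrayList<>()).add(a);
        }
        Map<String, Double> weights = new TreeMap<>();
        for (List<Annotation> page : byPage.values()) {
            for (int i = 0; i < page.size(); i++) {
                for (int j = i + 1; j < page.size(); j++) {
                    String x = entityKey(page.get(i));
                    String y = entityKey(page.get(j));
                    int d = Math.abs(page.get(i).sentenceId() - page.get(j).sentenceId());
                    if (d > WINDOW || x.equals(y)) {
                        continue; // too far apart, or the same entity twice
                    }
                    // Order the endpoints so that (x, y) and (y, x) share one key.
                    String edge = x.compareTo(y) < 0 ? x + "|" + y : y + "|" + x;
                    weights.merge(edge, Math.exp(-d), Double::sum);
                }
            }
        }
        return weights;
    }
}
```

Because edges are only formed within a page, annotations from different documents never contribute to the same co-occurrence; cross-document links arise only when the same entity key appears on several pages.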

To change the settings for connecting to your MongoDB instance,
please refer to SystemSettings.java.

(2) The query interface uses the same graph settings as the
algorithm. The graph is loaded into memory, with the exception
of the sentence collection. If support for sentences is to be
enabled, the sentences must be available in a MongoDB
collection.
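One way to read this setup is that the query interface depends on the sentence collection only through a lookup by page and sentence number. The following Java sketch expresses that idea. SentenceStore, InMemorySentenceStore and QuerySession are hypothetical names, and the in-memory store only stands in for a MongoDB-backed implementation so that the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction over the sentence collection: the in-memory graph
// only needs to resolve a (page, sentence) pair to its text.
interface SentenceStore {
    Optional<String> find(int pageId, int sentenceId);
}

// Stand-in for a MongoDB-backed store so that the sketch runs without a database.
final class InMemorySentenceStore implements SentenceStore {
    private record Key(int pageId, int sentenceId) {}

    private final Map<Key, String> texts = new HashMap<>();

    void put(int pageId, int sentenceId, String text) {
        texts.put(new Key(pageId, sentenceId), text);
    }

    @Override
    public Optional<String> find(int pageId, int sentenceId) {
        return Optional.ofNullable(texts.get(new Key(pageId, sentenceId)));
    }
}

// Hypothetical query session: sentence support is optional.
final class QuerySession {
    private final SentenceStore sentences; // null if sentence support is disabled

    QuerySession(SentenceStore sentences) {
        this.sentences = sentences;
    }

    boolean sentencesEnabled() {
        return sentences != null;
    }

    String showSentence(int pageId, int sentenceId) {
        if (!sentencesEnabled()) {
            return "[sentence support disabled]";
        }
        return sentences.find(pageId, sentenceId).orElse("[sentence not found]");
    }
}
```

A MongoDB-backed implementation would answer find() with a query on the sentence collection; constructing the session with null corresponds to running the interface without sentence support, while the graph itself stays fully in memory either way.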

Once started, the interface provides an information function
and built-in help. For details on query formulation, please
refer to the original paper.

(3) A tool for importing the graph representation into MongoDB
is available in MoveLOADNetworkToMongoDB.java. Note that this
tool simply stores the graph as an edge list and creates indices
for faster lookup. If necessary, the code can easily be adjusted
to write the edge list to any other target format.
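Since the tool writes the graph as an edge list, adapting it to another target format mostly amounts to changing how a single edge is serialized. The following minimal sketch assumes a simple (source, target, weight) edge. Edge, EdgeFormat, TsvFormat and EdgeListExporter are hypothetical and do not reflect the actual classes in MoveLOADNetworkToMongoDB.java; index creation is omitted.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical edge representation; the actual node identifiers and fields
// written by MoveLOADNetworkToMongoDB.java may differ.
record Edge(String source, String target, double weight) {}

// Serializes single edges; a new target format only needs a new implementation.
interface EdgeFormat {
    String header();

    String line(Edge edge);
}

// Tab-separated values with a header row.
final class TsvFormat implements EdgeFormat {
    @Override
    public String header() {
        return "source\ttarget\tweight";
    }

    @Override
    public String line(Edge edge) {
        return edge.source() + "\t" + edge.target() + "\t" + edge.weight();
    }
}

final class EdgeListExporter {
    private EdgeListExporter() {}

    // Writes the header and one line per edge, each terminated by '\n'.
    static void write(List<Edge> edges, EdgeFormat format, Writer out) throws IOException {
        out.write(format.header());
        out.write('\n');
        for (Edge edge : edges) {
            out.write(format.line(edge));
            out.write('\n');
        }
    }
}
```

Supporting, for example, CSV output would only require another EdgeFormat implementation; the traversal in EdgeListExporter stays unchanged.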

E. References
If you use this code or approach, please consider citing us:

A. Spitz and M. Gertz
Terms over LOAD: Leveraging Named Entities for Cross-Document
Extraction and Summarization of Events.
Proceedings of the 39th International Conference on Research
and Development in Information Retrieval (SIGIR '16).
doi: 10.1145/2911451.2911529

An authors' version of the article is available at
http://dbs.ifi.uni-heidelberg.de/fileadmin/Team/aspitz/publications/Spitz_Gertz_2016_Terms_over_LOAD.pdf
